A Semantic Similarity Approach to Electronic Document Modeling and Integration
Abstract
The World Wide Web is an enormous collection of information resources serving various purposes. However, the diversity of Web information and of the related formats makes it very difficult for users to efficiently search for and obtain the information they require. The difficulty arises because most of the information uploaded to the Web is unstructured or only loosely structured. Many metadata models have been proposed in response to this problem. These metadata models attempt to provide a general description of Web information to improve its structuredness. Although ill-structured Web documents make up the largest portion of Web information or Web resources, few metadata models deal with them by analyzing their semantic relations with each other. In this paper we consider this large portion of Web information, called electronic documents. We propose a metadata model called EDM (Electronic Document Metadata Model). Using this metadata model, we can extract semantic characteristics from electronic documents and then use those characteristics to form a semantic electronic document model. This model, in turn, provides a basis for analyzing semantic similarity between electronic documents and for electronic document integration. Document modeling and integration will support further manipulations of electronic documents, such as exchange, search, and evolution.
Similar resources
A Joint Semantic Vector Representation Model for Text Clustering and Classification
Text clustering and classification are two main tasks of text mining. Feature selection plays a key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researchers to use...
Automatic keyword extraction using Latent Dirichlet Allocation topic modeling: Similarity with golden standard and users' evaluation
Purpose: This study investigates the automatic keyword extraction from the table of contents of Persian e-books in the field of science using LDA topic modeling, evaluating their similarity with golden standard, and users' viewpoints of the model keywords. Methodology: This is a mixed text-mining research in which LDA topic modeling is used to extract keywords from the table of contents of sci...
Transition Potential Modeling of Land-Cover based on Similarity Weighted Instance-based Learning Procedure and Its Implication in the REDD Project Design Document
Reducing Emissions from Deforestation and Forest Degradation (REDD) is a climate change mitigation strategy employed to reduce the intensity of deforestation and GHG emissions. In recent decades, drastic land use changes in Mazandaran province caused a substantial reduction in the amount of Hyrcanian forests. The present research based on objectives of REDD projects paid to identify of fore...
A procedure for Web Service Selection Using WS-Policy Semantic Matching
In general, policy-based approaches play an important role in the management of web services, for instance in the choice of semantic web services and, in particular, of quality of service (QoS). The present research work illustrates a procedure for selecting a web service from among functionally similar web services based on WS-Policy semantic matching. In this study, the procedure of WS-Policy publi...
From digital libraries to electronic catalogues for engineering and manufacturing
Over the last decade, a formal data model of libraries of parts for manufacturing and engineering has been developed. This model, known as PLIB, officially ISO 13584, is suitable not only for the exchange of files containing parts, independent of any application that is using these files, but also as a basis for implementing and sharing databases of parts library data. More recently, a strong r...
Publication date: 2000